Loading Census Data From Census Reporter

Load data directly from the Census Reporter API, but use the columns titles for names, rather than the column id numbers.

Downloading data from the Census Reporter API returns both data and metadata.

To select an area to get results for, you must provide both a geoid for the containing area, and a summary level for the subdivisions.

To get the summary level, see this spreadsheet of summary levels.. You can get the geoids by browsing and searching at the Census Reporter website


In [77]:
import requests #  run '!pip install requests' in the Notebook to install
import pandas as pd

# Get the Age by sex table, and a lot of metadata, directly from the Census Reporter API
# for all tracts in California

table_id = 'B01001'
summary_level = '140' # Tracts
geoid='05000US06073' # San Diego County

data = requests.get("http://api.censusreporter.org/1.0/data/show/latest"
                    "?table_ids={table_id}&geo_ids={sl}|{geoid}"
                    .format(table_id=table_id, sl=summary_level, geoid=geoid)).json()

In [78]:
from operator import itemgetter

# It looks like the JSON dicts may be properly sorted, but I'm not sure I can rely on that.
# So, sort the column id values, then make a columns title list in the same order
columns = data['tables']['B01001']['columns']
columns_id_names = sorted(columns.keys() )
column_titles = [ columns[e]['name'] for e in columns_id_names]

# The item getter will produce a tuple from a dict, getting all of the 
# dict items in the order of the keys named in the argument list. In this case, 
# it turns the dict into a tuple of estimate values
row_ig = itemgetter(*columns_id_names)

tracts = data['data'].keys()

# Actually get the row data. 
estimate_rows = [ row_ig(data['data'][tract]['B01001']['estimate']) for tract in tracts ]
error_rows = [ row_ig(data['data'][tract]['B01001']['error']) for tract in tracts ]
# Now, creating the dataframe is really easy. 
df = pd.DataFrame(estimate_rows, columns=column_titles)
df.head()


Out[78]:
Total: Male: Under 5 years 5 to 9 years 10 to 14 years 15 to 17 years 18 and 19 years 20 years 21 years 22 to 24 years ... 50 to 54 years 55 to 59 years 60 and 61 years 62 to 64 years 65 and 66 years 67 to 69 years 70 to 74 years 75 to 79 years 80 to 84 years 85 years and over
0 2696.0 1410.0 54.0 77.0 108.0 89.0 30.0 10.0 7.0 89.0 ... 89.0 114.0 49.0 48.0 13.0 34.0 76.0 24.0 11.0 0.0
1 5285.0 2366.0 96.0 211.0 217.0 81.0 59.0 116.0 12.0 135.0 ... 185.0 166.0 89.0 72.0 87.0 49.0 71.0 20.0 16.0 60.0
2 7406.0 3826.0 339.0 188.0 249.0 109.0 106.0 187.0 83.0 273.0 ... 172.0 135.0 120.0 0.0 63.0 52.0 95.0 130.0 68.0 16.0
3 5429.0 2627.0 273.0 189.0 171.0 89.0 109.0 61.0 198.0 207.0 ... 243.0 169.0 28.0 19.0 34.0 115.0 82.0 28.0 75.0 18.0
4 4692.0 2231.0 136.0 126.0 103.0 54.0 80.0 46.0 17.0 47.0 ... 224.0 175.0 59.0 36.0 69.0 92.0 131.0 37.0 9.0 19.0

5 rows × 49 columns

Better Column Headings

The prior code works but results in headings that don't distinguish between female and male, so wee need to alter the headings to use the column groupings.


In [75]:
new_titles = []
last_heading = ''
for t in column_titles:

    t = t.replace('years','')
    
    if t.endswith(':'):
        t = t.strip(':')
        new_titles.append(t)
        last_heading = t
    else:
        new_titles.append(last_heading+' '+t)
        
df = pd.DataFrame(estimate_rows, columns=new_titles)  
df.head()


Out[75]:
Total Male Male Under 5 Male 5 to 9 Male 10 to 14 Male 15 to 17 Male 18 and 19 Male 20 Male 21 Male 22 to 24 ... Female 50 to 54 Female 55 to 59 Female 60 and 61 Female 62 to 64 Female 65 and 66 Female 67 to 69 Female 70 to 74 Female 75 to 79 Female 80 to 84 Female 85 and over
0 4530.0 2225.0 103.0 185.0 140.0 47.0 48.0 51.0 58.0 129.0 ... 131.0 73.0 74.0 19.0 7.0 17.0 22.0 6.0 35.0 21.0
1 5429.0 2627.0 273.0 189.0 171.0 89.0 109.0 61.0 198.0 207.0 ... 243.0 169.0 28.0 19.0 34.0 115.0 82.0 28.0 75.0 18.0
2 3580.0 1750.0 194.0 70.0 95.0 59.0 28.0 8.0 0.0 62.0 ... 180.0 151.0 82.0 17.0 23.0 55.0 77.0 72.0 111.0 79.0
3 4099.0 2133.0 165.0 71.0 37.0 72.0 17.0 13.0 0.0 118.0 ... 102.0 59.0 19.0 50.0 60.0 76.0 33.0 0.0 17.0 61.0
4 6064.0 3146.0 138.0 97.0 80.0 45.0 0.0 8.0 38.0 321.0 ... 85.0 132.0 79.0 48.0 42.0 50.0 41.0 42.0 26.0 17.0

5 rows × 49 columns


In [76]:
df[:5].T


Out[76]:
0 1 2 3 4
Total 4530.0 5429.0 3580.0 4099.0 6064.0
Male 2225.0 2627.0 1750.0 2133.0 3146.0
Male Under 5 103.0 273.0 194.0 165.0 138.0
Male 5 to 9 185.0 189.0 70.0 71.0 97.0
Male 10 to 14 140.0 171.0 95.0 37.0 80.0
Male 15 to 17 47.0 89.0 59.0 72.0 45.0
Male 18 and 19 48.0 109.0 28.0 17.0 0.0
Male 20 51.0 61.0 8.0 13.0 8.0
Male 21 58.0 198.0 0.0 0.0 38.0
Male 22 to 24 129.0 207.0 62.0 118.0 321.0
Male 25 to 29 445.0 217.0 96.0 330.0 646.0
Male 30 to 34 243.0 180.0 64.0 315.0 509.0
Male 35 to 39 244.0 81.0 147.0 240.0 227.0
Male 40 to 44 103.0 86.0 126.0 121.0 252.0
Male 45 to 49 136.0 95.0 105.0 160.0 147.0
Male 50 to 54 118.0 237.0 117.0 98.0 254.0
Male 55 to 59 44.0 181.0 165.0 115.0 92.0
Male 60 and 61 37.0 31.0 42.0 8.0 0.0
Male 62 to 64 16.0 31.0 83.0 42.0 41.0
Male 65 and 66 8.0 39.0 11.0 0.0 90.0
Male 67 to 69 22.0 0.0 64.0 62.0 12.0
Male 70 to 74 15.0 63.0 38.0 42.0 82.0
Male 75 to 79 7.0 26.0 54.0 26.0 26.0
Male 80 to 84 20.0 30.0 35.0 65.0 24.0
Male 85 and over 6.0 33.0 87.0 16.0 17.0
Female 2305.0 2802.0 1830.0 1966.0 2918.0
Female Under 5 203.0 312.0 116.0 77.0 99.0
Female 5 to 9 142.0 244.0 121.0 97.0 102.0
Female 10 to 14 88.0 180.0 85.0 27.0 118.0
Female 15 to 17 66.0 54.0 26.0 24.0 84.0
Female 18 and 19 81.0 91.0 30.0 16.0 36.0
Female 20 45.0 72.0 11.0 0.0 27.0
Female 21 62.0 11.0 28.0 0.0 107.0
Female 22 to 24 194.0 161.0 54.0 209.0 239.0
Female 25 to 29 355.0 183.0 73.0 452.0 526.0
Female 30 to 34 335.0 128.0 102.0 208.0 396.0
Female 35 to 39 109.0 273.0 126.0 215.0 338.0
Female 40 to 44 120.0 92.0 84.0 106.0 114.0
Female 45 to 49 100.0 190.0 127.0 58.0 170.0
Female 50 to 54 131.0 243.0 180.0 102.0 85.0
Female 55 to 59 73.0 169.0 151.0 59.0 132.0
Female 60 and 61 74.0 28.0 82.0 19.0 79.0
Female 62 to 64 19.0 19.0 17.0 50.0 48.0
Female 65 and 66 7.0 34.0 23.0 60.0 42.0
Female 67 to 69 17.0 115.0 55.0 76.0 50.0
Female 70 to 74 22.0 82.0 77.0 33.0 41.0
Female 75 to 79 6.0 28.0 72.0 0.0 42.0
Female 80 to 84 35.0 75.0 111.0 17.0 26.0
Female 85 and over 21.0 18.0 79.0 61.0 17.0

In [ ]: